436 research outputs found

    Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition

    A major emerging challenge is how to protect people's privacy as cameras and computer vision are increasingly integrated into our daily lives, including in smart devices inside homes. A potential solution is to capture and record just the minimum amount of information needed to perform a task of interest. In this paper, we propose a fully-coupled two-stream spatiotemporal architecture for reliable human action recognition on extremely low resolution (e.g., 12x16 pixel) videos. We provide an efficient method to extract spatial and temporal features and to aggregate them into a robust feature representation for an entire action video sequence. We also consider how to incorporate high resolution videos during training in order to build better low resolution action recognition models. We evaluate on two publicly available datasets, showing significant improvements over the state of the art. Comment: 9 pages, 5 figures, published in WACV 201

    Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems

    Predicting the future location of vehicles is essential for safety-critical applications such as advanced driver assistance systems (ADAS) and autonomous driving. This paper introduces a novel approach to simultaneously predict both the location and scale of target vehicles in the first-person (egocentric) view of an ego-vehicle. We present a multi-stream recurrent neural network (RNN) encoder-decoder model that separately captures both object location and scale and pixel-level observations for future vehicle localization. We show that incorporating dense optical flow improves prediction results significantly since it captures information about motion as well as appearance change. We also find that explicitly modeling future motion of the ego-vehicle improves the prediction accuracy, which could be especially beneficial in intelligent and automated vehicles that have motion planning capability. To evaluate the performance of our approach, we present a new dataset of first-person videos collected from a variety of scenarios at road intersections, which are particularly challenging moments for prediction because vehicle trajectories are diverse and dynamic. Comment: To appear at ICRA 201

    Recurrent violent injury: magnitude, risk factors, and opportunities for intervention from a statewide analysis.

    INTRODUCTION: Although preventing recurrent violent injury is an important component of a public health approach to interpersonal violence and a common focus of violence intervention programs, the true incidence of recurrent violent injury is unknown. Prior studies have reported recurrence rates from 0.8% to 44%, and risk factors for recurrence are not well established. METHODS: We used a statewide, all-payer database to perform a retrospective cohort study of emergency department visits for injury due to interpersonal violence in Florida, following up patients injured in 2010 for recurrence through 2012. We assessed risk factors for recurrence with multivariable logistic regression and estimated time to recurrence with the Kaplan-Meier method. We tabulated hospital charges and costs for index and recurrent visits. RESULTS: Of 53 908 patients presenting for violent injury in 2010, 11.1% had a recurrent violent injury during the study period. Trauma centers treated 31.8%, including 55.9% of severe injuries. Among recurrers, 58.9% went to a different hospital for their second injury. Low income, homelessness, Medicaid or uninsurance, and black race were associated with increased odds of recurrence. Patients with visits for mental and behavioral health and unintentional injury also had increased odds of recurrence. Index injuries accounted for $105 million in costs, and recurrent injuries accounted for another $25.3 million. CONCLUSIONS: Recurrent violent injury is a common and costly phenomenon, and effective violence prevention programs are needed. Prevention must include the nontrauma centers where many patients seek care.
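
The abstract above names the Kaplan-Meier method for estimating time to recurrence. As a rough illustration only, the product-limit estimator can be sketched as below. The function name `kaplan_meier` and the toy follow-up data are hypothetical and are not taken from the study; a patient with no recurrence by the end of follow-up is treated as censored.

```python
def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct event time.

    durations: follow-up time for each subject (any consistent unit).
    observed:  True if the event (e.g. a recurrence) occurred at that time,
               False if the subject was censored (follow-up simply ended).
    """
    # Only times at which at least one event happened change the estimate.
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        # Subjects still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Events occurring exactly at time t.
        events = sum(1 for d, e in zip(durations, observed) if d == t and e)
        # Product-limit update: S(t) = S(t-) * (1 - events / at_risk).
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data: months of follow-up and whether a recurrence was seen.
toy_durations = [1, 2, 2, 3, 4]
toy_observed = [True, True, False, True, False]
toy_curve = kaplan_meier(toy_durations, toy_observed)
```

Here `1 - survival` at a given time is the estimated cumulative probability of recurrence by that time, which is how a recurrence curve would typically be read from such an estimate.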

    Predicting Geo-informative Attributes in Large-Scale Image Collections Using Convolutional Neural Networks

    Geographic location is a powerful property for organizing large-scale photo collections, but only a small fraction of online photos are geo-tagged. Most work in automatically estimating geo-tags from image content is based on comparison against models of buildings or landmarks, or on matching to large reference collections of geo-tagged images. These approaches work well for frequently-photographed places like major cities and tourist destinations, but fail for photos taken in sparsely photographed places where few reference photos exist. Here we consider how to recognize general geo-informative attributes of a photo, e.g. the elevation gradient, population density, demographics, etc. of where it was taken, instead of trying to estimate a precise geo-tag. We learn models for these attributes using a large (noisy) set of geo-tagged images from Flickr by training deep convolutional neural networks (CNNs). We evaluate on over a dozen attributes, showing that while automatically recognizing some attributes is very difficult, others can be automatically estimated with about the same accuracy as a human.

    Identifying First-person Camera Wearers in Third-person Videos

    We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene. To do this, we need to establish person-level correspondences across first- and third-person videos, which is challenging because the camera wearer is not visible from his/her own egocentric video, preventing the use of direct feature matching. In this paper, we propose a new semi-Siamese Convolutional Neural Network architecture to address this novel challenge. We formulate the problem as learning a joint embedding space for first- and third-person videos that considers both spatial- and motion-domain cues. A new triplet loss function is designed to minimize the distance between correct first- and third-person matches while maximizing the distance between incorrect ones. This end-to-end approach performs significantly better than several baselines, in part by learning the first- and third-person features optimized for matching jointly with the distance measure itself.
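
The abstract above does not give the form of its new triplet loss. For orientation only, the sketch below shows the standard margin-based triplet loss on plain Euclidean distance, which pulls a matching pair together and pushes a non-matching pair apart. This is an assumption standing in for the paper's loss, not a reproduction of it; the names `euclidean` and `triplet_loss`, the margin value, and the toy embeddings are all hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss (not the paper's variant).

    anchor:   embedding of, e.g., a first-person video.
    positive: embedding of the correct third-person match.
    negative: embedding of an incorrect third-person candidate.
    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings, for illustration only.
well_separated = triplet_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
badly_separated = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
```

In the first call the incorrect candidate is already far from the anchor, so the loss vanishes; in the second the roles are swapped and the loss is large, which is the signal a training procedure would use to adjust the embedding.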